2024.7.1 RNN: the very basics (unfinished) [torch]

Trying out torch.nn.RNN (official). First, go through all of the arguments.

code:rnn1.py

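import torch as pt
import torch.nn as nn
rnn = nn.RNN(
    input_size = 2, # number of features in the input signal
    hidden_size = 3, # number of nodes in the hidden layer
    num_layers = 1, # number of recurrent layers
    nonlinearity = 'tanh', # activation function ('tanh' or 'relu')
    bias = True, # whether to use bias terms
    batch_first = False, # if True, tensors are (batch, seq, feature)
    dropout = 0.0, # dropout between layers (only matters when num_layers > 1)
    bidirectional = False, # if True, build a bidirectional RNN
    device = None, # device on which the parameters are created
    dtype = None # data type of the parameters
)
p = list(rnn.parameters()) # (A)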

(A) rnn.parameters() returns the weight tensors as a generator, so list() is used to collect them into a list.
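As a small supplement to (A), the sketch below lists each parameter together with its name and shape using named_parameters(). The configuration is the one from the code above. The printed names are the attribute names PyTorch assigns internally.

```python
import torch.nn as nn

# Same configuration as above (bias=True, one layer).
rnn = nn.RNN(input_size=2, hidden_size=3, num_layers=1, bias=True)

# named_parameters() yields (name, parameter) pairs instead of bare tensors.
for name, param in rnn.named_parameters():
    print(name, tuple(param.shape))
```

With this configuration the loop prints four entries: two weight matrices (input-to-hidden and hidden-to-hidden) and two bias vectors.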

Four arguments were examined this time:

input_size

hidden_size

num_layers

bias

First, with num_layers = 0 the program did not run.

(Experiment 1) input_size=2, hidden_size=3, num_layers=1, bias=False

This configures the following network.

[$ h(t) = \tanh (W_x x(t) + W_h h(t-1))]

Note that the default activation function is [$ \tanh].

code:rnn1.py

import torch.nn as nn
rnn = nn.RNN(
    input_size = 2, # number of input features
    hidden_size = 3, # number of nodes in the hidden layer
    num_layers = 1, # number of recurrent layers
    bias = False,
)
W = list(rnn.parameters()) # (1)
print(W)

code:result1.txt

# W_x
tensor([[ 0.1411, -0.4485],
        [-0.4695,  0.0508],
        [-0.2951, -0.0127]], requires_grad=True)
# W_h
tensor([[ 0.4412,  0.4350,  0.1039],
        [-0.4392,  0.1499, -0.5452],
        [ 0.3140, -0.5007,  0.2941]], requires_grad=True)

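This network has two weight matrices.

[$ W_x: 3\times 2] carries the signal from the input [$ x(t): 2] to the hidden layer [$ h(t): 3].

[$ W_h: 3 \times 3] carries the signal from the previous step's hidden output [$ h(t-1): 3] to [$ h(t): 3].

There are no biases.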
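The roles of the two matrices can be checked numerically. The following is a minimal sketch, assuming a recent PyTorch version that accepts unbatched 2-D input and an initial hidden state of zeros (the default when none is passed). The two-step input sequence and the seed are arbitrary values chosen for illustration. The loop applies the recurrence by hand and compares it with the module's output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
rnn = nn.RNN(input_size=2, hidden_size=3, num_layers=1, bias=False)
W_x = rnn.weight_ih_l0  # shape (3, 2): input -> hidden
W_h = rnn.weight_hh_l0  # shape (3, 3): hidden -> hidden

# Illustrative unbatched input: (seq_len=2, input_size=2).
x = torch.tensor([[1.0, 2.0], [0.5, -1.0]])

# Apply h(t) = tanh(W_x x(t) + W_h h(t-1)) by hand, starting from h = 0.
h = torch.zeros(3)
steps = []
for x_t in x:
    h = torch.tanh(W_x @ x_t + W_h @ h)
    steps.append(h)
manual = torch.stack(steps)  # shape (2, 3)

output, h_n = rnn(x)
print(torch.allclose(output, manual, atol=1e-5))
```

Two time steps are used because with a single step and a zero initial state the term involving W_h vanishes, so it would not be exercised.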

As expected, all of them have requires_grad = True.

Strictly speaking, each weight is not a plain torch.Tensor but a torch.nn.parameter.Parameter, which is a subclass of torch.Tensor.
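The type relationship can be confirmed directly. This is a small sketch using the same layer configuration as Experiment 1. The variable name w is only for illustration.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=2, hidden_size=3, num_layers=1, bias=False)
w = next(rnn.parameters())  # first parameter (input-to-hidden weights)

print(type(w))                       # the Parameter class
print(isinstance(w, nn.Parameter))   # a Parameter ...
print(isinstance(w, torch.Tensor))   # ... which is also a Tensor
print(w.requires_grad)

# detach() returns a plain Tensor that shares the data but is cut off from autograd.
plain = w.detach()
print(type(plain), plain.requires_grad)
```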

Now feed some data into the network above.

code:rnn2.py

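import torch as pt
import torch.nn as nn
rnn = nn.RNN(input_size = 2, hidden_size = 3, num_layers = 1, bias = False)
W = list(rnn.parameters()) # (1)
# unbatched input of shape (seq_len=1, input_size=2)
x = pt.tensor([[1, 2]], dtype=pt.float)
y = rnn(x)
print(len(y), type(y)) # 2 <class 'tuple'>
print(y[0]) # y0
print(y[1]) # y1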

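The output y is returned as a tuple of two tensors: y[0] holds the hidden-layer output for every time step, and y[1] holds the final hidden state. Here the sequence has only one step and the network has one layer, so both contain the same values.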

code:result2.txt

2 <class 'tuple'>
>> y0
tensor([[ 0.7731, -0.2384,  0.6260]], grad_fn=<SqueezeBackward1>)
>> y1
tensor([[ 0.7731, -0.2384,  0.6260]], grad_fn=<SqueezeBackward1>)
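The difference between the two tuple elements becomes visible with a longer sequence. The sketch below assumes a recent PyTorch version that accepts unbatched input. The sequence length of 4 and the random input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=2, hidden_size=3, num_layers=1, bias=False)

# Illustrative unbatched input: (seq_len=4, input_size=2).
x = torch.randn(4, 2)
output, h_n = rnn(x)

print(output.shape)  # one hidden vector per time step
print(h_n.shape)     # one final hidden vector per layer

# For a single unidirectional layer, the last output row is the final hidden state.
print(torch.allclose(output[-1], h_n[0]))
```

Here output has one row per time step, while h_n has one row per layer.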

(Experiment 2) input_size=2, hidden_size=3, num_layers=1, bias=True

This is Experiment 1 with biases enabled. As a formula:

[$ h(t) = \tanh( W_x x(t) + \mathbf{b}_x + W_h h(t-1) + \mathbf{b}_h )]

code:result2.py

# W_x
tensor([[ 0.1667, -0.1976],
        [ 0.5217, -0.4428],
        [ 0.0945, -0.0327]], requires_grad=True)
# W_h
tensor([[-0.2238,  0.3517, -0.3365],
        [ 0.0069, -0.4170,  0.2224],
        [-0.0834, -0.2436,  0.4620]], requires_grad=True)
# b_x
tensor([ 0.2363, -0.1493,  0.5071], requires_grad=True)
# b_h
tensor([ 0.2309,  0.5558, -0.2050], requires_grad=True)

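This network has four parameter tensors.

[$ W_x, W_{h}] play the same roles as in Experiment 1.

[$ \mathbf{b}_x] is the bias vector for the signal from [$ x(t)] to [$ h(t)].

[$ \mathbf{b}_h] is the bias vector for the signal from [$ h(t-1)] to [$ h(t)].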
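The way the two bias vectors enter the update can be checked in the same manner. This is a minimal sketch, assuming unbatched input support and a zero initial hidden state. The input values and seed are arbitrary. Both biases are simply added inside the tanh, so only their sum affects the result.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
rnn = nn.RNN(input_size=2, hidden_size=3, num_layers=1, bias=True)
W_x, W_h = rnn.weight_ih_l0, rnn.weight_hh_l0
b_x, b_h = rnn.bias_ih_l0, rnn.bias_hh_l0

# Illustrative unbatched input: (seq_len=2, input_size=2).
x = torch.tensor([[1.0, 2.0], [0.5, -1.0]])

# h(t) = tanh(W_x x(t) + b_x + W_h h(t-1) + b_h), starting from h = 0.
h = torch.zeros(3)
for x_t in x:
    h = torch.tanh(W_x @ x_t + b_x + W_h @ h + b_h)

output, h_n = rnn(x)
print(torch.allclose(h_n[0], h, atol=1e-5))
```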

(Experiment 3) Set the number of recurrent layers to 2

bias is set to False.

[$ h_1(t) = \tanh(W_{x1} x(t) + W_{h1} h_1(t-1))]

[$ h_2(t) = \tanh(W_{x2} h_1(t) + W_{h2} h_2(t-1))]

code:rnn3.py

import torch.nn as nn
rnn = nn.RNN(
    input_size = 2, # number of input features
    hidden_size = 3, # number of nodes in the hidden layer
    num_layers = 2, # number of recurrent layers
    bias = False,
)
p = list(rnn.parameters()) # (1)
print(p)
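Since this experiment is not written up yet, the following is only a sketch of what a two-layer network looks like, assuming num_layers = 2, unbatched input support, and zero initial hidden states. The input values and seed are arbitrary. It prints the parameter names and shapes, then recomputes the output by hand, feeding the first layer's hidden state into the second layer as its input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
rnn = nn.RNN(input_size=2, hidden_size=3, num_layers=2, bias=False)

# Each layer has its own input-to-hidden and hidden-to-hidden matrix.
for name, param in rnn.named_parameters():
    print(name, tuple(param.shape))

# Illustrative unbatched input: (seq_len=2, input_size=2).
x = torch.tensor([[1.0, 2.0], [0.5, -1.0]])

h1 = torch.zeros(3)  # hidden state of layer 1
h2 = torch.zeros(3)  # hidden state of layer 2
top = []
for x_t in x:
    h1 = torch.tanh(rnn.weight_ih_l0 @ x_t + rnn.weight_hh_l0 @ h1)
    h2 = torch.tanh(rnn.weight_ih_l1 @ h1 + rnn.weight_hh_l1 @ h2)
    top.append(h2)
manual = torch.stack(top)

output, h_n = rnn(x)
print(torch.allclose(output, manual, atol=1e-5))  # output = top layer, all steps
print(h_n.shape)                                  # final state of each layer
```

The second layer's input-to-hidden matrix is 3 x 3 rather than 3 x 2 because its input is the first layer's hidden state.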